Two-step TAG parsing revisited
Abstract
Based on the work in (Poller, 1994) and a minor assumption about a normal form for TAGs, we present a highly simplified version of the two-step parsing approach for TAGs which allows for a much easier analysis of run-time and space complexity. It also suggests how restrictions on the grammars might result in improvements in run-time complexity. The main advantage of a two-step parsing system shows in practical applications like Verbmobil (Bub et al., 1997), where the parser must look at multiple hypotheses supplied by a speech recognizer (encoded in a word hypotheses lattice) and filter out illicit hypotheses as early as possible. The first (context-free) step of our parser quickly filters out some illicit hypotheses in $O(n^3)$ time; the constructed parsing matrix is then reused for the second step, the complete $O(n^6)$ TAG parse.
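The filtering idea behind the first step can be illustrated with a minimal sketch. It is not the parser described in the abstract: it assumes a hypothetical toy grammar in Chomsky normal form (`BINARY`, `LEXICAL`), uses a standard CYK chart as the $O(n^3)$ context-free pass, and takes the hypotheses as a plain list of word sequences rather than a word hypotheses lattice. The second (TAG) step is not implemented; the sketch only returns the chart that such a step could reuse.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form (an assumption for
# illustration; it is not taken from the paper).
BINARY = {("NP", "VP"): {"S"}, ("Det", "N"): {"NP"}, ("V", "NP"): {"VP"}}
LEXICAL = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "sees": {"V"}}


def cyk_chart(words):
    """Build a CYK chart: chart[(i, j)] holds the nonterminals spanning words[i:j]."""
    n = len(words)
    chart = {}
    for i, word in enumerate(words):
        chart[(i, i + 1)] = set(LEXICAL.get(word, ()))
    for width in range(2, n + 1):          # span length
        for i in range(n - width + 1):     # span start
            j = i + width
            cell = set()
            for k in range(i + 1, j):      # split point
                for left, right in product(chart[(i, k)], chart[(k, j)]):
                    cell |= BINARY.get((left, right), set())
            chart[(i, j)] = cell
    return chart


def context_free_filter(hypotheses, start="S"):
    """Keep only hypotheses the context-free pass accepts, with their charts."""
    survivors = []
    for words in hypotheses:
        if not words:
            continue  # an empty hypothesis cannot derive the start symbol here
        chart = cyk_chart(words)
        if start in chart[(0, len(words))]:
            # The chart is kept so a later, more expensive step could reuse it.
            survivors.append((words, chart))
    return survivors


hypotheses = [
    ["the", "dog", "sees", "the", "cat"],
    ["dog", "the", "sees", "cat", "the"],
]
kept = context_free_filter(hypotheses)
```

In this sketch only the first hypothesis survives the cheap pass, so a full TAG parse would need to run on one chart instead of two.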
Similar resources
Multiword Expression-Aware A$*$ TAG Parsing Revisited
A* algorithms enable efficient parsing within the context of large grammars and/or complex syntactic formalisms. Besides, it has been shown that promoting multiword expressions (MWEs) is a beneficial strategy in dealing with syntactic ambiguity. The state-of-the-art A* heuristic for promoting MWEs in tree-adjoining grammar (TAG) parsing has certain drawbacks: it is not monotonic and it composes...
PLCFRS Parsing Revisited: Restricting the Fan-Out to Two
Linear Context-Free Rewriting System (LCFRS) is an extension of Context-Free Grammar (CFG) in which a non-terminal can dominate more than a single continuous span of terminals. Probabilistic LCFRS have recently successfully been used for the direct data-driven parsing of discontinuous structures. In this paper we present a parser for binary PLCFRS of fan-out two, together with a novel monotonou...
Interfacing Sentential and Discourse TAG-based Grammars
Tree-Adjoining Grammars (TAG) have been used both for syntactic parsing, with sentential grammars, and for discourse parsing, with discourse grammars. But the modeling of discourse connectives (coordinate conjunctions, subordinate conjunctions, adverbs, etc.) in TAG-based formalisms for discourse differs from their modeling in sentential grammars. Because of this mismatch, an intermediate, not ...
Lambek Grammars, Tree Adjoining Grammars and Hyperedge Replacement Grammars
Two recent extensions of the nonassociative Lambek calculus, the Lambek-Grishin calculus and the multimodal Lambek calculus, are shown to generate the same class of languages as tree adjoining grammars, using (tree generating) hyperedge replacement grammars as an intermediate step. As a consequence, both extensions are mildly context-sensitive formalisms and benefit from polynomial parsing algorithms.
Antecedent Recovery: Experiments with a Trace Tagger
This paper explores the problem of finding non-local dependencies. First, we isolate a set of features useful for this task. Second, we develop both a two-step approach which combines a trace tagger with a state-of-the-art lexicalized parser and a one-step approach which finds nonlocal dependencies while parsing. We find that the former outperforms the latter because it makes better use of the ...